Locally Linear Embedding (LLE) — Manifold Learning (From Scratch)#

Locally Linear Embedding (LLE) is a nonlinear dimensionality reduction method that tries to “unfold” a manifold by preserving how each point can be reconstructed from its nearest neighbors.

Learning goals#

  • Build intuition for why LLE works (local geometry → global coordinates)

  • Derive the local reconstruction weights and the global embedding objective

  • See how LLE becomes an eigenvalue problem

  • Implement standard LLE from scratch in NumPy

  • Visualize neighborhood reconstruction + weight preservation with Plotly

import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio

from plotly.subplots import make_subplots

from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.preprocessing import StandardScaler

pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
np.set_printoptions(precision=4, suppress=True)

rng = np.random.default_rng(42)

Prerequisites#

  • Linear algebra basics (matrix solve, eigenvectors)

  • K-nearest neighbors (KNN)

  • A mental model of a manifold: high-dimensional data that is “locally simple”

1) Intuition#

“Rebuilding each point from its neighbors”

LLE starts with a very local question:

  • If I look only at the K nearest neighbors of a point, can I express that point as a linear combination of those neighbors?

Think of each neighborhood as a tiny patch of the manifold. Even if the manifold is curved globally, small patches are often close to flat.
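We can sanity-check the "locally flat" claim numerically. On a noisy circle (a 1-D manifold living in 2-D), the top principal component of one small neighborhood should capture almost all of its variance. This is a minimal sketch with made-up data, not part of the LLE algorithm itself:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.linspace(0, 2 * np.pi, 200, endpoint=False)
X = np.c_[np.cos(theta), np.sin(theta)] + 0.01 * rng.normal(size=(200, 2))

# Take one point's 10 nearest neighbors and look at the local PCA spectrum.
i = 0
d2 = np.sum((X - X[i]) ** 2, axis=1)
patch = X[np.argsort(d2)[:11]]          # the point itself plus 10 neighbors
patch = patch - patch.mean(axis=0)      # center the patch
s = np.linalg.svd(patch, compute_uv=False)
ratio = (s[0] ** 2) / np.sum(s ** 2)    # fraction of variance on the top axis
print(ratio)                            # close to 1: the patch is nearly 1-D
```

The globally curved circle looks almost perfectly linear at the neighborhood scale, which is exactly the regime where LLE's local linear reconstructions make sense.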

“Local recipes that must still work globally”

Once we learn a “recipe” (the neighbor weights) for each point in the original space, LLE searches for low-dimensional coordinates where the same recipes still reconstruct the points. The embedding is the set of coordinates that makes all of those local recipes consistent at once.

# A small 2D intuition picture: one point + its neighbors
n_curve = 240
theta = np.linspace(0, 2 * np.pi, n_curve, endpoint=False)
X_curve = np.c_[np.cos(theta), np.sin(theta)] + 0.03 * rng.normal(size=(n_curve, 2))

i0 = 40
k0 = 10
dist2 = np.sum((X_curve - X_curve[i0]) ** 2, axis=1)
nn0 = np.argsort(dist2)[1 : k0 + 1]

fig = go.Figure()
fig.add_trace(
    go.Scatter(
        x=X_curve[:, 0],
        y=X_curve[:, 1],
        mode="markers",
        marker=dict(size=6, color="rgba(120,120,120,0.35)"),
        name="data",
    )
)
fig.add_trace(
    go.Scatter(
        x=X_curve[nn0, 0],
        y=X_curve[nn0, 1],
        mode="markers",
        marker=dict(size=10, color="#1f77b4"),
        name=f"{k0} nearest neighbors",
    )
)
fig.add_trace(
    go.Scatter(
        x=[X_curve[i0, 0]],
        y=[X_curve[i0, 1]],
        mode="markers",
        marker=dict(size=14, color="black", symbol="star"),
        name="target point",
    )
)

# Light “spokes” to show the neighborhood
for j in nn0:
    fig.add_trace(
        go.Scatter(
            x=[X_curve[i0, 0], X_curve[j, 0]],
            y=[X_curve[i0, 1], X_curve[j, 1]],
            mode="lines",
            line=dict(color="rgba(31,119,180,0.25)", width=2),
            showlegend=False,
        )
    )

fig.update_layout(
    title="Intuition: neighborhoods are small patches of the manifold",
    xaxis_title="x₁",
    yaxis_title="x₂",
    width=750,
    height=520,
)
fig.show()

2) Mathematical Explanation#

2.1 Local linear reconstruction weights#

Let the dataset be points \(x_1, \dots, x_n\) with \(x_i \in ℝ^D\).

For each point \(x_i\), pick its neighbor index set \(N(i)\) (usually KNN). LLE chooses weights \(w_{ij}\) (only for \(j \in N(i)\)) that minimize the local reconstruction error:

\[ \min_{w_i} \; \left\|x_i - \sum_{j \in N(i)} w_{ij} x_j\right\|_2^2 \quad \text{s.t.} \quad \sum_{j \in N(i)} w_{ij} = 1 \]

The sum-to-one constraint makes the weights invariant to translations (shifting all points by the same vector).
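To make the translation invariance concrete, here is a minimal sketch with a made-up point and neighbors: shifting the entire cloud by the same vector leaves the constrained weights unchanged, because only the differences \(x_j - x_i\) enter the solve.

```python
import numpy as np

def solve_weights(x, nbrs, reg=1e-3):
    """Sum-to-one reconstruction weights via the local Gram matrix."""
    Z = nbrs - x                           # (k, d) neighbor differences
    C = Z @ Z.T + reg * np.eye(len(nbrs))  # regularized local Gram matrix
    w = np.linalg.solve(C, np.ones(len(nbrs)))
    return w / w.sum()                     # enforce sum(w) == 1

rng = np.random.default_rng(0)
x_i = rng.normal(size=3)                   # a target point in R^3
nbrs = rng.normal(size=(5, 3))             # five hypothetical neighbors

w = solve_weights(x_i, nbrs)
shift = np.array([5.0, -2.0, 7.0])
w_shifted = solve_weights(x_i + shift, nbrs + shift)

print(np.allclose(w, w_shifted))           # True: translation changes nothing
```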

A common way to compute \(w_i\):

  • Build the neighbor difference matrix \(Z_i \in ℝ^{K \times D}\) with rows \(x_j - x_i\) for \(j \in N(i)\).

  • Form the local covariance (Gram) matrix \(C_i = Z_i Z_i^T \in ℝ^{K \times K}\).

  • Solve \(C_i w_i = \mathbf{1}\) (the all-ones vector) and normalize \(w_i\) so its entries sum to 1.

  • Add a small regularization term to \(C_i\) for numerical stability (especially when neighbors are nearly collinear).
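The regularization in the last step matters in practice. A tiny constructed example shows why: when the neighbors are collinear with the target point, \(C_i\) is rank-deficient and the unregularized solve is ill-posed, while a small ridge proportional to \(\operatorname{Tr}(C_i)\) makes it well-behaved.

```python
import numpy as np

# Three neighbors exactly collinear with the target point:
x_i = np.array([0.0, 0.0])
nbrs = np.array([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])

Z = nbrs - x_i                    # (k, d) neighbor differences
C = Z @ Z.T                       # (k, k) local Gram matrix
print(np.linalg.matrix_rank(C))   # 1 -> singular, solve(C, ones) would fail

# A small ridge proportional to trace(C) makes the solve well-posed:
reg = 1e-3 * np.trace(C)
w = np.linalg.solve(C + reg * np.eye(3), np.ones(3))
w /= w.sum()                      # enforce the sum-to-one constraint
print(w)
```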

2.2 Global cost function (preserve the weights)#

After we have weights, we search for low-dimensional coordinates \(y_i \in ℝ^d\) that preserve those same reconstructions:

\[ \min_Y \; \sum_{i=1}^n \left\|y_i - \sum_{j=1}^n W_{ij} y_j\right\|_2^2 \]

Here \(W\) is an \(n \times n\) matrix where \(W_{ij}\) is nonzero only if \(j \in N(i)\).

Write the objective in matrix form (with \(Y \in ℝ^{n \times d}\)):

\[ \Phi(Y) = \|(I - W)Y\|_F^2 = \operatorname{Tr}\left(Y^T M Y\right), \quad M = (I - W)^T (I - W) \]

To avoid the trivial solution \(Y = 0\), we constrain \(Y\): center it (\(\sum_i y_i = 0\)) and fix its scale (\(\tfrac{1}{n} Y^T Y = I\)). Minimizing the trace under those constraints yields an eigenvalue problem:

\[ Mv = \lambda v \]

The embedding \(Y\) is formed from the eigenvectors with the smallest non-zero eigenvalues (skipping the smallest one, whose eigenvector is the constant vector).
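Why is the constant vector always an eigenvector with eigenvalue 0? Each row of \(W\) sums to 1, so \((I - W)\mathbf{1} = 0\) and hence \(M\mathbf{1} = 0\). A tiny sketch with random sum-to-one weights (hypothetical stand-ins for real LLE weights) confirms this:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 8, 3

# Random LLE-style weight matrix: k nonzeros per row, each row summing to 1.
W = np.zeros((n, n))
for i in range(n):
    nbrs = rng.choice([j for j in range(n) if j != i], size=k, replace=False)
    w = rng.random(k) + 0.1        # positive weights, just for simplicity
    W[i, nbrs] = w / w.sum()

M = (np.eye(n) - W).T @ (np.eye(n) - W)

# (I - W) @ 1 = 0, therefore M @ 1 = 0:
print(np.allclose(M @ np.ones(n), 0.0))   # True
```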

3) Step-by-Step Algorithm#

  1. Neighbor selection

    • For each point \(x_i\), find the indices of its \(K\) nearest neighbors \(N(i)\).

  2. Weight computation (local step)

    • For each point \(x_i\), solve for weights \(w_i\) that reconstruct \(x_i\) from \(X[N(i)]\).

    • Enforce \(\sum_j w_{ij} = 1\).

  3. Global embedding

    • Build the sparse weight matrix \(W\).

    • Form \(M = (I - W)^T(I - W)\).

    • Compute the bottom eigenvectors of \(M\) and take the smallest non-trivial ones as the embedding coordinates.

def pairwise_squared_distances(X: np.ndarray) -> np.ndarray:
    """Compute dense pairwise squared Euclidean distances.

    Returns an (n, n) matrix with `np.inf` on the diagonal.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim != 2:
        raise ValueError("X must be a 2D array of shape (n_samples, n_features)")

    n = X.shape[0]
    if n < 2:
        raise ValueError("X must contain at least 2 samples")

    X_norm = np.sum(X**2, axis=1, keepdims=True)
    dist2 = X_norm + X_norm.T - 2.0 * (X @ X.T)
    dist2 = np.maximum(dist2, 0.0)
    np.fill_diagonal(dist2, np.inf)
    return dist2


def knn_indices_from_dist2(dist2: np.ndarray, k: int) -> np.ndarray:
    """Return (n, k) neighbor indices given a pairwise distance matrix."""
    dist2 = np.asarray(dist2, dtype=float)
    if dist2.ndim != 2 or dist2.shape[0] != dist2.shape[1]:
        raise ValueError("dist2 must be a square (n,n) distance matrix")

    n = dist2.shape[0]
    if not (1 <= k < n):
        raise ValueError(f"k must satisfy 1 <= k < n (got k={k}, n={n})")

    idx = np.argpartition(dist2, kth=k - 1, axis=1)[:, :k]
    rows = np.arange(n)[:, None]
    order = np.argsort(dist2[rows, idx], axis=1)
    idx = idx[rows, order]
    return idx


def lle_local_weights(X: np.ndarray, neighbors: np.ndarray, reg: float = 1e-3) -> np.ndarray:
    """Compute LLE reconstruction weights for each point.

    Returns weights with shape (n, k), aligned with `neighbors`.
    """
    X = np.asarray(X, dtype=float)
    neighbors = np.asarray(neighbors)

    if X.ndim != 2:
        raise ValueError("X must be a 2D array of shape (n_samples, n_features)")
    if neighbors.ndim != 2:
        raise ValueError("neighbors must be a 2D integer array of shape (n, k)")

    n, d = X.shape
    if neighbors.shape[0] != n:
        raise ValueError("neighbors must have the same number of rows as X")

    k = neighbors.shape[1]
    if not (1 <= k < n):
        raise ValueError(f"neighbors k must satisfy 1 <= k < n (got k={k}, n={n})")
    if np.any(neighbors < 0) or np.any(neighbors >= n):
        raise ValueError("neighbors contains out-of-range indices")

    weights = np.zeros((n, k), dtype=float)
    ones = np.ones(k, dtype=float)

    for i in range(n):
        Ni = neighbors[i]
        Z = X[Ni] - X[i]  # (k, d)
        C = Z @ Z.T        # (k, k)

        tr = np.trace(C)
        C += (reg * tr if tr > 0 else reg) * np.eye(k)

        try:
            w = np.linalg.solve(C, ones)
        except np.linalg.LinAlgError:
            w = np.linalg.lstsq(C, ones, rcond=None)[0]

        w_sum = np.sum(w)
        if np.isclose(w_sum, 0.0):
            w = ones / k
        else:
            w /= w_sum
        weights[i] = w

    return weights


class ScratchLLE:
    def __init__(self, n_neighbors: int = 12, n_components: int = 2, reg: float = 1e-3):
        self.n_neighbors = int(n_neighbors)
        self.n_components = int(n_components)
        self.reg = float(reg)

    def fit(self, X: np.ndarray):
        X = np.asarray(X, dtype=float)
        if X.ndim != 2:
            raise ValueError("X must be a 2D array of shape (n_samples, n_features)")

        n = X.shape[0]
        if n < 2:
            raise ValueError("X must contain at least 2 samples")
        if not (1 <= self.n_components < n):
            raise ValueError(
                f"n_components must satisfy 1 <= n_components < n (got {self.n_components}, n={n})"
            )
        if not (1 <= self.n_neighbors < n):
            raise ValueError(
                f"n_neighbors must satisfy 1 <= n_neighbors < n (got {self.n_neighbors}, n={n})"
            )

        dist2 = pairwise_squared_distances(X)
        neighbors = knn_indices_from_dist2(dist2, self.n_neighbors)
        weights = lle_local_weights(X, neighbors, reg=self.reg)

        W = np.zeros((n, n), dtype=float)
        rows = np.arange(n)[:, None]
        W[rows, neighbors] = weights

        M = np.eye(n) - W
        M = M.T @ M

        evals, evecs = np.linalg.eigh(M)
        Y = evecs[:, 1 : self.n_components + 1]

        self.embedding_ = Y
        self.neighbors_ = neighbors
        self.weights_ = weights
        self.eigenvalues_ = evals
        return self

    def fit_transform(self, X: np.ndarray) -> np.ndarray:
        return self.fit(X).embedding_


def lle_from_scratch(
    X: np.ndarray,
    n_neighbors: int = 12,
    n_components: int = 2,
    reg: float = 1e-3,
) -> tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]:
    """Convenience wrapper around `ScratchLLE` (standard LLE)."""
    model = ScratchLLE(n_neighbors=n_neighbors, n_components=n_components, reg=reg).fit(X)
    return model.embedding_, model.neighbors_, model.weights_, model.eigenvalues_

4) Plotly Visualizations#

We’ll use a classic benchmark: the Swiss roll. It’s a 2D surface embedded in 3D.

4.1 Swiss roll → 2D embedding#

n_samples = 900
X_swiss, t = make_swiss_roll(n_samples=n_samples, noise=0.06, random_state=42)
X_swiss = StandardScaler().fit_transform(X_swiss)

n_neighbors = 12
n_components = 2

scratch_lle = ScratchLLE(n_neighbors=n_neighbors, n_components=n_components, reg=1e-3).fit(X_swiss)
Y_scratch = scratch_lle.embedding_
neighbors_swiss = scratch_lle.neighbors_
weights_swiss = scratch_lle.weights_
evals = scratch_lle.eigenvalues_

X_hat = np.sum(weights_swiss[..., None] * X_swiss[neighbors_swiss], axis=1)
recon_err = np.linalg.norm(X_swiss - X_hat, axis=1)
print(
    f"Local reconstruction error: mean={recon_err.mean():.4f}, "
    f"95th%={np.percentile(recon_err, 95):.4f}"
)

fig = px.histogram(recon_err, nbins=40, title="Local reconstruction errors (||x - x̂||)")
fig.update_layout(xaxis_title="||x - x̂||", yaxis_title="count", width=850, height=420)
fig.show()

fig = px.scatter_3d(
    x=X_swiss[:, 0],
    y=X_swiss[:, 1],
    z=X_swiss[:, 2],
    color=t,
    title="Swiss roll in 3D (color = intrinsic parameter)",
    labels={"x": "x₁", "y": "x₂", "z": "x₃"},
)
fig.update_traces(marker=dict(size=3, opacity=0.8))
fig.show()

fig = px.scatter(
    x=Y_scratch[:, 0],
    y=Y_scratch[:, 1],
    color=t,
    title=f"LLE embedding from scratch (K={n_neighbors}, d={n_components})",
    labels={"x": "y₁", "y": "y₂"},
)
fig.update_traces(marker=dict(size=5, opacity=0.9))
fig.show()
Local reconstruction error: mean=0.0053, 95th%=0.0124

4.2 Local neighborhood reconstruction#

For one point \(x_i\), we can visualize:

  • its neighbor set \(N(i)\)

  • the reconstructed point \(\hat{x}_i = \sum_{j \in N(i)} w_{ij} x_j\)

The reconstruction should be close in the original space if the neighborhood is locally “flat enough”.

def plot_local_reconstruction_3d(
    X: np.ndarray,
    t: np.ndarray,
    neighbors: np.ndarray,
    weights: np.ndarray,
    i: int,
) -> go.Figure:
    Ni = neighbors[i]
    wi = weights[i]
    x_i = X[i]
    x_hat = wi @ X[Ni]
    err = float(np.linalg.norm(x_i - x_hat))

    fig = go.Figure()
    fig.add_trace(
        go.Scatter3d(
            x=X[:, 0],
            y=X[:, 1],
            z=X[:, 2],
            mode="markers",
            marker=dict(size=2, color="rgba(140,140,140,0.25)"),
            name="all points",
        )
    )
    fig.add_trace(
        go.Scatter3d(
            x=X[Ni, 0],
            y=X[Ni, 1],
            z=X[Ni, 2],
            mode="markers",
            marker=dict(
                size=7,
                color=wi,
                colorscale="RdBu",
                reversescale=True,
                colorbar=dict(title="weight"),
            ),
            name="neighbors (colored by weight)",
        )
    )
    fig.add_trace(
        go.Scatter3d(
            x=[x_i[0]],
            y=[x_i[1]],
            z=[x_i[2]],
            mode="markers",
            marker=dict(size=9, color="black", symbol="diamond"),
            name=f"x[{i}]",
        )
    )
    fig.add_trace(
        go.Scatter3d(
            x=[x_hat[0]],
            y=[x_hat[1]],
            z=[x_hat[2]],
            mode="markers",
            marker=dict(size=9, color="#d62728", symbol="x"),
            name="reconstruction (x̂)",
        )
    )
    fig.add_trace(
        go.Scatter3d(
            x=[x_i[0], x_hat[0]],
            y=[x_i[1], x_hat[1]],
            z=[x_i[2], x_hat[2]],
            mode="lines",
            line=dict(color="#d62728", width=6),
            showlegend=False,
        )
    )

    fig.update_layout(
        title=f"Local reconstruction in 3D (i={i}, ||x - x̂||={err:.3e})",
        scene=dict(xaxis_title="x₁", yaxis_title="x₂", zaxis_title="x₃"),
        width=900,
        height=620,
    )
    return fig


i_demo = 250
plot_local_reconstruction_3d(X_swiss, t, neighbors_swiss, weights_swiss, i_demo).show()

4.3 Weight preservation illustration#

In the embedding, LLE tries to make each point satisfy the same reconstruction:

\[ y_i \approx \sum_{j \in N(i)} w_{ij} y_j \]

Below, we apply the same weights to neighbors in both the original space and the embedding.

def plot_weight_preservation(
    X: np.ndarray,
    Y: np.ndarray,
    neighbors: np.ndarray,
    weights: np.ndarray,
    i: int,
) -> go.Figure:
    Ni = neighbors[i]
    wi = weights[i]

    x_i = X[i]
    x_hat = wi @ X[Ni]
    err_x = float(np.linalg.norm(x_i - x_hat))

    y_i = Y[i]
    y_hat = wi @ Y[Ni]
    err_y = float(np.linalg.norm(y_i - y_hat))

    fig = make_subplots(
        rows=1,
        cols=2,
        specs=[[{"type": "scene"}, {"type": "xy"}]],
        subplot_titles=(
            f"Original space (||x - x̂||={err_x:.3e})",
            f"Embedding space (||y - ŷ||={err_y:.3e})",
        ),
    )

    # Original space (3D)
    fig.add_trace(
        go.Scatter3d(
            x=X[:, 0],
            y=X[:, 1],
            z=X[:, 2],
            mode="markers",
            marker=dict(size=2, color="rgba(140,140,140,0.22)"),
            showlegend=False,
        ),
        row=1,
        col=1,
    )
    fig.add_trace(
        go.Scatter3d(
            x=X[Ni, 0],
            y=X[Ni, 1],
            z=X[Ni, 2],
            mode="markers",
            marker=dict(size=7, color=wi, colorscale="RdBu", reversescale=True),
            name="neighbors",
        ),
        row=1,
        col=1,
    )
    fig.add_trace(
        go.Scatter3d(
            x=[x_i[0]],
            y=[x_i[1]],
            z=[x_i[2]],
            mode="markers",
            marker=dict(size=9, color="black", symbol="diamond"),
            name="x",
        ),
        row=1,
        col=1,
    )
    fig.add_trace(
        go.Scatter3d(
            x=[x_hat[0]],
            y=[x_hat[1]],
            z=[x_hat[2]],
            mode="markers",
            marker=dict(size=9, color="#d62728", symbol="x"),
            name="x̂",
        ),
        row=1,
        col=1,
    )

    # Embedding space (2D)
    fig.add_trace(
        go.Scatter(
            x=Y[:, 0],
            y=Y[:, 1],
            mode="markers",
            marker=dict(size=6, color="rgba(140,140,140,0.25)"),
            showlegend=False,
        ),
        row=1,
        col=2,
    )
    fig.add_trace(
        go.Scatter(
            x=Y[Ni, 0],
            y=Y[Ni, 1],
            mode="markers",
            marker=dict(size=11, color=wi, colorscale="RdBu", reversescale=True, showscale=True),
            name="neighbors (weights)",
        ),
        row=1,
        col=2,
    )
    fig.add_trace(
        go.Scatter(
            x=[y_i[0]],
            y=[y_i[1]],
            mode="markers",
            marker=dict(size=14, color="black", symbol="diamond"),
            name="y",
        ),
        row=1,
        col=2,
    )
    fig.add_trace(
        go.Scatter(
            x=[y_hat[0]],
            y=[y_hat[1]],
            mode="markers",
            marker=dict(size=14, color="#d62728", symbol="x"),
            name="ŷ",
        ),
        row=1,
        col=2,
    )

    fig.update_layout(
        title=f"Weight preservation (same neighborhood weights, i={i})",
        width=1100,
        height=520,
    )
    fig.update_scenes(xaxis_title="x₁", yaxis_title="x₂", zaxis_title="x₃", row=1, col=1)
    fig.update_xaxes(title_text="y₁", row=1, col=2)
    fig.update_yaxes(title_text="y₂", row=1, col=2)
    return fig


plot_weight_preservation(X_swiss, Y_scratch, neighbors_swiss, weights_swiss, i_demo).show()

4.4 Compare with sklearn (sanity check)#

Different implementations can produce embeddings that are rotated/flipped/scaled differently, but the geometry should match.
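One way to make "the geometry should match" quantitative is a Procrustes alignment: center and scale both embeddings, find the best orthogonal map between them, and inspect the residual. Here is a self-contained sketch (the helper is an illustration, not part of sklearn), verified on a synthetic rotated + scaled + shifted copy; the same helper could then be applied to `Y_scratch` and `Y_sklearn`:

```python
import numpy as np

def procrustes_disparity(A, B):
    """Residual Frobenius error after centering, rescaling, and optimally
    rotating/reflecting B onto A (0 means identical shapes)."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    A = A / np.linalg.norm(A)              # remove scale (Frobenius norm)
    B = B / np.linalg.norm(B)
    U, _, Vt = np.linalg.svd(A.T @ B)
    R = (U @ Vt).T                         # optimal orthogonal map B -> A
    return np.linalg.norm(A - B @ R)

# Sanity check: a rotated + scaled + shifted copy should align near-perfectly.
rng = np.random.default_rng(7)
A = rng.normal(size=(100, 2))
theta = 0.8
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
B = 3.5 * A @ R + np.array([1.0, -2.0])

print(procrustes_disparity(A, B))          # ~0
```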

lle = LocallyLinearEmbedding(
    n_neighbors=n_neighbors,
    n_components=n_components,
    method="standard",
    random_state=42,
)
Y_sklearn = lle.fit_transform(X_swiss)

fig = px.scatter(
    x=Y_sklearn[:, 0],
    y=Y_sklearn[:, 1],
    color=t,
    title="`sklearn` LocallyLinearEmbedding (standard)",
    labels={"x": "y₁", "y": "y₂"},
)
fig.update_traces(marker=dict(size=5, opacity=0.9))
fig.show()

5) Strengths & Weaknesses#

Strengths#

  • Captures nonlinear structure while preserving local geometry.

  • Works well when data lies on a smooth manifold and neighborhoods are well-sampled.

  • No gradient descent needed for the standard method (solve local linear systems + an eigenproblem).

Weaknesses#

  • Manifold assumptions: if the data is not locally linear (or not a manifold), LLE can distort structure.

  • Noise sensitivity: noisy neighbors → unstable weights → poor embeddings.

  • Choice of K is critical: too small → disconnected graph; too large → “short-circuits” across the manifold.

  • Global step needs an eigen-decomposition (can be expensive for large \(n\) without sparse solvers).
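On the last point, the usual remedy is to keep \(W\) sparse and use an iterative eigensolver for just the bottom few eigenpairs of \(M\). A sketch of that route with SciPy's `eigsh` (using random sum-to-one weights as hypothetical stand-ins for real LLE weights):

```python
import numpy as np
from scipy.sparse import identity, lil_matrix
from scipy.sparse.linalg import eigsh

rng = np.random.default_rng(3)
n, k = 300, 10

# Sparse LLE-style weight matrix: k entries per row, each row summing to 1.
W = lil_matrix((n, n))
for i in range(n):
    nbrs = rng.choice([j for j in range(n) if j != i], size=k, replace=False)
    w = rng.random(k) + 0.1
    W[i, nbrs] = w / w.sum()

IW = identity(n, format="csr") - W.tocsr()
M = (IW.T @ IW).tocsc()

# Shift-invert around 0 targets the smallest eigenvalues directly,
# without ever forming a dense eigen-decomposition.
evals, evecs = eigsh(M, k=4, sigma=0.0, which="LM")
order = np.argsort(evals)
evals, evecs = evals[order], evecs[:, order]

print(evals[0])            # ~0: the trivial constant-vector eigenvalue
Y = evecs[:, 1:3]          # a 2D embedding from the next-smallest eigenvectors
```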

Exercises#

  1. Try different values of n_neighbors (e.g., 6, 12, 24). When does the embedding break?

  2. Add more noise to the Swiss roll. How do the local reconstruction errors change?

  3. Replace the brute-force neighbor search with sklearn.neighbors.NearestNeighbors and compare runtime.

References#

  • Roweis & Saul (2000), Nonlinear Dimensionality Reduction by Locally Linear Embedding.

  • sklearn.manifold.LocallyLinearEmbedding documentation.